Pandas: Dealing with Missing Data

pandas
dataframe
This notebook demonstrates how to handle missing data in Pandas DataFrames. Each section below explains a different operation or concept.
Author

Mohammed Adil Siraju

Published

September 16, 2025

Import Libraries

Import Pandas and NumPy, which are essential for data manipulation and handling missing values.

import pandas as pd
import numpy as np

What is np.nan?

np.nan is NumPy's floating-point “Not a Number” value. Pandas uses it as the default marker for missing numeric data.

np.nan
nan
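
One property worth checking before going further: NaN never compares equal to anything, including itself, so == cannot be used to find missing values. A minimal sketch of that behavior (the variable names are illustrative and not part of the original notebook):

```python
import numpy as np
import pandas as pd

# np.nan is an ordinary Python float
nan_is_float = isinstance(np.nan, float)

# NaN is not equal to itself, so an equality test always fails
equals_itself = (np.nan == np.nan)

# pd.isna is the reliable way to test a value for missingness
detected = pd.isna(np.nan)

print(nan_is_float, equals_itself, detected)  # True False True
```

This is why the sections that follow rely on isnull() rather than comparisons against np.nan.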

Create DataFrame with Missing Values

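This section creates a DataFrame containing missing values using np.nan. Because NaN is a float, every column that contains it is stored as float64, which is why the integers below appear as 1.0, 2.0, and so on.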

data = {'A': [1,2,np.nan,4,5],
        'B': [6,np.nan,7,8,9],
        'C': [11,12,13,np.nan,15]
        }
df = pd.DataFrame(data)
df
     A    B     C
0  1.0  6.0  11.0
1  2.0  NaN  12.0
2  NaN  7.0  13.0
3  4.0  8.0   NaN
4  5.0  9.0  15.0

Detect Missing Values

Use isnull() to check which values are missing in the DataFrame. It returns a boolean DataFrame of the same shape, with True marking each missing value.

df.isnull()
       A      B      C
0  False  False  False
1  False   True  False
2   True  False  False
3  False  False   True
4  False  False  False
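
The boolean mask can also be used to select rows. The sketch below rebuilds the same DataFrame so it runs on its own, confirms that isna() is an alias of isnull(), and keeps only the rows with at least one missing value. The names same_mask and rows_with_missing are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 7, 8, 9],
                   'C': [11, 12, 13, np.nan, 15]})

# isna() and isnull() produce identical masks
same_mask = df.isna().equals(df.isnull())

# any(axis=1) is True for a row if any of its columns is missing
rows_with_missing = df[df.isna().any(axis=1)]

print(same_mask)                         # True
print(rows_with_missing.index.tolist())  # [1, 2, 3]
```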

Count Missing Values

Use isnull().sum() to count the number of missing values in each column.

df.isnull().sum()
A    1
B    1
C    1
dtype: int64
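
Two related summaries are often useful: the total number of missing cells and the fraction missing per column. A small sketch, assuming the same DataFrame as above (total_missing and missing_fraction are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 7, 8, 9],
                   'C': [11, 12, 13, np.nan, 15]})

# Summing the per-column counts gives the total across the DataFrame
total_missing = int(df.isnull().sum().sum())

# The mean of a boolean mask is the fraction of True values per column
missing_fraction = df.isnull().mean()

print(total_missing)               # 3
print(missing_fraction.tolist())   # one missing value out of five in each column
```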

Drop Rows with Missing Values

Use dropna() to remove rows containing missing values from the DataFrame.

# Drop rows that contain at least one NaN, modifying df in place
df.dropna(inplace=True)
# or, equivalently, assign the result back
df = df.dropna()

View DataFrame After Dropping Rows

Display the DataFrame after removing rows with missing values.

df
     A    B     C
0  1.0  6.0  11.0
4  5.0  9.0  15.0
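
Dropping every row with any missing value removed three of the five rows here. When that is too aggressive, dropna() accepts subset and how arguments to narrow the condition. A minimal sketch on a fresh copy of the data (dropped_b and dropped_all are illustrative names):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 7, 8, 9],
                   'C': [11, 12, 13, np.nan, 15]})

# Drop a row only when column 'B' is missing
dropped_b = df.dropna(subset=['B'])

# Drop a row only when every value in it is missing (none qualify here)
dropped_all = df.dropna(how='all')

print(dropped_b.index.tolist())  # [0, 2, 3, 4]
print(len(dropped_all))          # 5
```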

Reset Index After Dropping Rows

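Use reset_index(drop=True) to renumber the rows from 0 after dropping. drop=True discards the old index instead of adding it as a column. The call returns a new DataFrame, so assign the result back if you want to keep it.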

df.reset_index(drop=True)
     A    B     C
0  1.0  6.0  11.0
1  5.0  9.0  15.0
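
Since reset_index() does not modify the DataFrame it is called on, the result has to be stored to take effect. A short sketch of the difference, with clean and reset as illustrative names:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, np.nan, 4, 5],
                   'B': [6, np.nan, 7, 8, 9],
                   'C': [11, 12, 13, np.nan, 15]})

clean = df.dropna()                   # keeps the original labels 0 and 4
reset = clean.reset_index(drop=True)  # new DataFrame labelled 0 and 1

print(clean.index.tolist())  # [0, 4]
print(reset.index.tolist())  # [0, 1]
```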

Create Another DataFrame with Missing Values

This section creates a new DataFrame with missing values for further operations.

data1 = {'A': [1,2,3,4,5],
        'B': [6,np.nan,7,8,9],
        'C': [11,12,13,np.nan,15]
        }
df1 = pd.DataFrame(data1)
df1
   A    B     C
0  1  6.0  11.0
1  2  NaN  12.0
2  3  7.0  13.0
3  4  8.0   NaN
4  5  9.0  15.0

Drop Columns with Missing Values

Use dropna(axis=1) to remove columns containing missing values from the DataFrame.

df1 = df1.dropna(axis=1)
df1
   A
0  1
1  2
2  3
3  4
4  5
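
Dropping a whole column for a single missing value discards a lot of data: columns B and C each had four valid entries. One gentler option is to combine axis=1 with thresh so that a column is kept when it has enough non-missing values. The cutoffs below are arbitrary choices for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, np.nan, 7, 8, 9],
                    'C': [11, 12, 13, np.nan, 15]})

# Keep columns with at least 4 non-missing values (all three qualify)
lenient = df1.dropna(axis=1, thresh=4)

# Require 5 non-missing values, which matches plain dropna(axis=1) here
strict = df1.dropna(axis=1, thresh=5)

print(list(lenient.columns))  # ['A', 'B', 'C']
print(list(strict.columns))   # ['A']
```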

Create DataFrame for Threshold Example

This section creates a DataFrame to demonstrate dropping rows based on a threshold of non-missing values.

data2 = {'A': [1,2,3,4,5],
        'B': [6,np.nan,7,np.nan,9],
        'C': [11,12,13,np.nan,15]
        }
df2 = pd.DataFrame(data2)
df2
   A    B     C
0  1  6.0  11.0
1  2  NaN  12.0
2  3  7.0  13.0
3  4  NaN   NaN
4  5  9.0  15.0

Drop Rows Based on Threshold

Use dropna(thresh=2) to keep only rows with at least 2 non-missing values. Row 3 has a single non-missing value (A), so it is the only row removed.

df2 = df2.dropna(thresh=2)
df2
   A    B     C
0  1  6.0  11.0
1  2  NaN  12.0
2  3  7.0  13.0
4  5  9.0  15.0
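
To see why a given row survives, it can help to count the non-missing values per row directly. The sketch below reproduces the thresh=2 result by hand on a fresh copy of the data; non_missing and kept are illustrative names.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, np.nan, 7, np.nan, 9],
                    'C': [11, 12, 13, np.nan, 15]})

# Number of non-missing values in each row
non_missing = df2.notna().sum(axis=1)

# Keeping rows with a count of at least 2 matches dropna(thresh=2)
kept = df2[non_missing >= 2]

print(non_missing.tolist())                # [3, 2, 3, 1, 3]
print(kept.equals(df2.dropna(thresh=2)))   # True
```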

Fill Missing Values with Zero

Use fillna(0) to replace all missing values in the DataFrame with zero. Since row 3 was already dropped from df2, only the missing B value in row 1 is affected.

df2 = df2.fillna(0)
df2
   A    B     C
0  1  6.0  11.0
1  2  0.0  12.0
2  3  7.0  13.0
4  5  9.0  15.0
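
A single fill value is not always appropriate for every column. fillna() also accepts a dictionary mapping column names to fill values. The sketch below uses the original five-row data, and the choice of 0 for B and the column mean for C is purely illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, np.nan, 7, np.nan, 9],
                    'C': [11, 12, 13, np.nan, 15]})

# Fill each column with its own value: 0 for B, the mean of C for C
filled = df2.fillna({'B': 0, 'C': df2['C'].mean()})

print(filled.loc[3, 'B'])  # 0.0
print(filled.loc[3, 'C'])  # 12.75
```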

Create DataFrame for Fill Methods

This section creates a DataFrame to demonstrate different methods for filling missing values.

data3 = {'A': [1,2,3,4,5],
        'B': [6,np.nan,7,np.nan,9],
        'C': [11,12,13,np.nan,15]
        }
df3 = pd.DataFrame(data3)
df3
   A    B     C
0  1  6.0  11.0
1  2  NaN  12.0
2  3  7.0  13.0
3  4  NaN   NaN
4  5  9.0  15.0

Fill Missing Values with Mean or Median

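Use fillna(df3.mean()) or fillna(df3.median()) to replace missing values with the mean or median of each column. Both calls return a new DataFrame and leave df3 unchanged. A notebook cell displays only its last expression, so the output shown is the median fill.

df3.fillna(df3.mean())    # result not displayed
df3.fillna(df3.median())  # last expression: this result is displayed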
   A    B     C
0  1  6.0  11.0
1  2  7.0  12.0
2  3  7.0  13.0
3  4  7.0  12.5
4  5  9.0  15.0
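
The table above shows only one of the two fills, and the values identify it as the median: the median of B is 7.0, while its mean is about 7.33. A sketch that keeps both results so they can be compared (mean_filled and median_filled are illustrative names):

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, np.nan, 7, np.nan, 9],
                    'C': [11, 12, 13, np.nan, 15]})

# mean() and median() skip NaN by default
mean_filled = df3.fillna(df3.mean())
median_filled = df3.fillna(df3.median())

# Compare the value each strategy puts into the gaps
print(round(mean_filled.loc[1, 'B'], 2), median_filled.loc[1, 'B'])  # 7.33 7.0
print(mean_filled.loc[3, 'C'], median_filled.loc[3, 'C'])            # 12.75 12.5
```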

Fill Missing Values with Forward/Backward Fill

Use ffill() for forward fill and bfill() for backward fill to propagate neighboring non-missing values into the gaps. The older fillna(method='ffill') and fillna(method='bfill') spellings are deprecated and emit a FutureWarning in recent pandas versions. Only the last expression in the cell (the backward fill) is displayed.

df3.ffill()
df3.bfill()
   A    B     C
0  1  6.0  11.0
1  2  7.0  12.0
2  3  7.0  13.0
3  4  9.0  15.0
4  5  9.0  15.0
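
The output above corresponds to the backward fill, because it is the last expression in the cell. The sketch below uses the non-deprecated DataFrame.ffill() and DataFrame.bfill() methods and keeps both results for comparison. The small Series at the end is an added illustration that a forward fill cannot fill a leading gap.

```python
import numpy as np
import pandas as pd

df3 = pd.DataFrame({'A': [1, 2, 3, 4, 5],
                    'B': [6, np.nan, 7, np.nan, 9],
                    'C': [11, 12, 13, np.nan, 15]})

forward = df3.ffill()   # each gap takes the previous valid value in its column
backward = df3.bfill()  # each gap takes the next valid value in its column

print(forward.loc[3, ['B', 'C']].tolist())   # [7.0, 13.0]
print(backward.loc[3, ['B', 'C']].tolist())  # [9.0, 15.0]

# A leading NaN has no earlier value to copy, so ffill leaves it in place
leading_gap = pd.Series([np.nan, 1.0, np.nan]).ffill()
print(leading_gap.isna().tolist())           # [True, False, False]
```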